I will use data retail.csv to fulfill my LBB assignment (Learning By Building)
Importing data retail.csv from folder data_input. Because of the data format is .csv, so the function to import the data is using data.csv().
Preview 5 initial data:
Preview last data:
str()Checking data structure and brief information of data retail.csv using str().
## 'data.frame': 9994 obs. of 15 variables:
## $ Row.ID : int 1 2 3 4 5 6 7 8 9 10 ...
## $ Order.ID : chr "CA-2016-152156" "CA-2016-152156" "CA-2016-138688" "US-2015-108966" ...
## $ Order.Date : chr "11/8/16" "11/8/16" "6/12/16" "10/11/15" ...
## $ Ship.Date : chr "11/11/16" "11/11/16" "6/16/16" "10/18/15" ...
## $ Ship.Mode : chr "Second Class" "Second Class" "Second Class" "Standard Class" ...
## $ Customer.ID : chr "CG-12520" "CG-12520" "DV-13045" "SO-20335" ...
## $ Segment : chr "Consumer" "Consumer" "Corporate" "Consumer" ...
## $ Product.ID : chr "FUR-BO-10001798" "FUR-CH-10000454" "OFF-LA-10000240" "FUR-TA-10000577" ...
## $ Category : chr "Furniture" "Furniture" "Office Supplies" "Furniture" ...
## $ Sub.Category: chr "Bookcases" "Chairs" "Labels" "Tables" ...
## $ Product.Name: chr "Bush Somerset Collection Bookcase" "Hon Deluxe Fabric Upholstered Stacking Chairs, Rounded Back" "Self-Adhesive Address Labels for Typewriters by Universal" "Bretford CR4500 Series Slim Rectangular Table" ...
## $ Sales : num 262 731.9 14.6 957.6 22.4 ...
## $ Quantity : int 2 3 2 5 2 7 4 6 3 5 ...
## $ Discount : num 0 0 0 0.45 0.2 0 0 0.2 0.2 0 ...
## $ Profit : num 41.91 219.58 6.87 -383.03 2.52 ...
dim()If we want to get total row and column of a dataframe, we can use dim()
## [1] 9994 15
## [1] "Row.ID" "Order.ID" "Order.Date" "Ship.Date" "Ship.Mode"
## [6] "Customer.ID" "Segment" "Product.ID" "Category" "Sub.Category"
## [11] "Product.Name" "Sales" "Quantity" "Discount" "Profit"
library(dplyr)
library(parsedate)
library(lubridate)
retail <- retail %>%
mutate(Ship.Date=parse_date(Ship.Date),
Order.Date=parse_date(Order.Date),
Category=as.factor(Category),
Sub.Category=as.factor(Sub.Category),
Segment=as.factor(Segment),
Ship.Mode=as.factor(Ship.Mode),
shiporder_difftime = difftime(time1 = Ship.Date,
time2 = Order.Date,
units = "day"),
Year=lubridate::year(Order.Date),
Product.Name=as.factor(Product.Name))## Row.ID Order.ID Order.Date Ship.Date
## 0 0 0 0
## Ship.Mode Customer.ID Segment Product.ID
## 0 0 0 0
## Category Sub.Category Product.Name Sales
## 0 0 0 0
## Quantity Discount Profit shiporder_difftime
## 0 0 0 0
## Year
## 0
Conclusion: no null data from the dataset
## Row.ID Order.ID Order.Date
## Min. : 1 Length:9994 Min. :2014-01-03 00:00:00
## 1st Qu.:2499 Class :character 1st Qu.:2015-05-23 00:00:00
## Median :4998 Mode :character Median :2016-06-26 00:00:00
## Mean :4998 Mean :2016-04-30 00:07:12
## 3rd Qu.:7496 3rd Qu.:2017-05-14 00:00:00
## Max. :9994 Max. :2017-12-30 00:00:00
##
## Ship.Date Ship.Mode Customer.ID
## Min. :2014-01-07 00:00:00 First Class :1538 Length:9994
## 1st Qu.:2015-05-27 00:00:00 Same Day : 543 Class :character
## Median :2016-06-29 00:00:00 Second Class :1945 Mode :character
## Mean :2016-05-03 23:06:58 Standard Class:5968
## 3rd Qu.:2017-05-18 00:00:00
## Max. :2018-01-05 00:00:00
##
## Segment Product.ID Category
## Consumer :5191 Length:9994 Furniture :2121
## Corporate :3020 Class :character Office Supplies:6026
## Home Office:1783 Mode :character Technology :1847
##
##
##
##
## Sub.Category Product.Name Sales
## Binders :1523 Staple envelope : 48 Min. : 0.444
## Paper :1370 Easy-staple paper : 46 1st Qu.: 17.280
## Furnishings: 957 Staples : 46 Median : 54.490
## Phones : 889 Avery Non-Stick Binders : 20 Mean : 229.858
## Storage : 846 Staples in misc. colors : 19 3rd Qu.: 209.940
## Art : 796 KI Adjustable-Height Table: 18 Max. :22638.480
## (Other) :3613 (Other) :9797
## Quantity Discount Profit shiporder_difftime
## Min. : 1.00 Min. :0.0000 Min. :-6599.978 Length:9994
## 1st Qu.: 2.00 1st Qu.:0.0000 1st Qu.: 1.729 Class :difftime
## Median : 3.00 Median :0.2000 Median : 8.666 Mode :numeric
## Mean : 3.79 Mean :0.1562 Mean : 28.657
## 3rd Qu.: 5.00 3rd Qu.:0.2000 3rd Qu.: 29.364
## Max. :14.00 Max. :0.8000 Max. : 8399.976
##
## Year
## Min. :2014
## 1st Qu.:2015
## Median :2016
## Mean :2016
## 3rd Qu.:2017
## Max. :2017
##
What product that give maximum profit within 4 years?
retail[ retail$Profit == max(retail$Profit),
c('Segment','Category','Sub.Category','Product.Name','Year', 'Sales','Quantity','Profit', 'shiporder_difftime')] Maximum profit 8399.976 in 4 years was earned by the company in 2016 from segment Technologies with product name ‘Canon imageCLASS 2200 Advanced Copier’.
What product that give minimum profit within 4 years?
retail[ retail$Profit == min(retail$Profit),
c('Segment','Category','Sub.Category', 'Product.Name','Year','Sales','Quantity','Profit', 'shiporder_difftime')] Minimum profit -6599.978 earned by the company in 2016 and from segment Technologies with product name ‘Cubify CubeX 3D Printer Double Head Print’
How much products in percentage that give negative profit within 4 years?
library(scales)
negative_profit <- retail[ retail$Profit < 0, c("Product.ID", "Product.Name", "Category", "Profit")]
percent(nrow(negative_profit)/nrow(retail))## [1] "19%"
81% from all transaction has brought profit for the company and 19% has negative profit.
How much total profit gained from all products within 4 years
## [1] 286397
In 4 years from January 2014 to December 2017, The company successfully has earned total profit 286397
profit_category <- retail %>%
group_by(Category) %>%
summarise(Total_Profit = sum(Profit))
profit_category <- profit_category %>%
mutate(remark=paste(Category,":",Total_Profit))
total_profit_cat <- profit_category %>%
ggplot(aes(x=reorder(Category,-Total_Profit),y=Total_Profit,text=remark)) +
geom_col(aes(fill = Category),show.legend = F) +
labs(title = "Total Profit Per Category",x="Category",y="Total Profit") +
theme_algoritma
ggplotly(total_profit_cat,tooltip = "text")profit_segment <- retail %>%
group_by(Segment) %>%
summarise(Total_Profit = sum(Profit))
profit_segment <- profit_segment %>%
mutate(remark=paste(Segment,":",Total_Profit))
total_profit_seg <- profit_segment %>%
ggplot(aes(x=reorder(Segment,-Total_Profit),y=Total_Profit,text=remark)) +
geom_col(aes(fill = Segment),show.legend = F) +
labs(title = "Total Profit Per Segment",x="Segment",y="Total Profit") +
theme_algoritma
ggplotly(total_profit_seg,tooltip = "text")profit_seg_cat <- retail %>%
group_by(Segment,Category) %>%
summarise(Total_Profit = sum(Profit))
profit_seg_cat <- profit_seg_cat %>%
mutate(remark=paste(Segment,"&",Category,":",Total_Profit))
total_profit_seg_cat <- profit_seg_cat %>%
ggplot(aes(text=remark)) +
geom_col(aes(x=Segment,y=Total_Profit,fill=Category),position = "dodge") +
labs(title = "Total Profit Per Segment & Category",x="Segment",y="Total Profit") +
theme_algoritma
ggplotly(total_profit_seg_cat,tooltip = "text")The most profitable segment is Consumer and from category Technologies with total profit 70797.8096 Technologies is the highest profitable category from all segments, and Furniture is the smallest.
yearly_profit <- retail %>%
group_by(Year) %>%
summarise(Total_Profit = sum(Profit))
profit_fluctuation <- yearly_profit %>%
ggplot(aes(x= Year,y=Total_Profit))+
geom_line(linetype="dashed", color="blue",size=1.2,group=1)+geom_point(color="black", size=3)+
labs(title = "Profit Fluctuation per Year",x="Year", y="Total Profit")+
theme_algoritma
ggplotly(profit_fluctuation)items_sold_category <- retail %>%
group_by(Category) %>%
summarise(Total_Items_Sold = sum(Quantity))
items_sold_category <- items_sold_category %>%
mutate(remark=paste(Category,":",Total_Items_Sold))
total_items_sold <- items_sold_category %>%
ggplot(aes(x=reorder(Category,-Total_Items_Sold),y=Total_Items_Sold,text=remark)) +
geom_col(aes(fill = Category),show.legend = F) +
labs(title = "Total Items Sold Per Category",x="Category",y="Quantity") +
theme_algoritma
ggplotly(total_items_sold,tooltip = "text")## [1] 37873
In 4 years from January 2014 to December 2017, The company successfully has sold 37873 items
items_seg_cat <- retail %>%
group_by(Segment,Category) %>%
summarise(Quantity = sum(Quantity))
items_seg_cat <- items_seg_cat %>%
mutate(remark=paste(Segment,"&",Category,":",Quantity))
total_items_seg_cat <- items_seg_cat %>%
ggplot(aes(text=remark)) +
geom_col(aes(x=Segment,y=Quantity,fill=Category),position = "dodge") +
labs(title = "Total Items Sold Per Segment & Category",x="Segment",y="Quantity") +
theme_algoritma
ggplotly(total_items_seg_cat,tooltip = "text")profit_seg_subcat <- retail %>%
group_by(Category,Sub.Category) %>%
summarise(Profit = sum(Profit))
profit_seg_subcat <- profit_seg_subcat %>%
mutate(remark=paste(Category,"&",Sub.Category,":",Profit))
total_profit_seg_subcat <- profit_seg_subcat %>%
ggplot(aes(text=remark)) +
geom_col(aes(x=Category,y=Profit,fill=Sub.Category),position = "dodge") +
labs(title = "Total Profit Per Category & Sub.Category",x="Category",y="Profit") +
theme_algoritma
ggplotly(total_profit_seg_subcat,tooltip = "text")profit_tech_copiers <- retail %>%
filter(Category=="Technology" & Sub.Category=="Copiers") %>%
group_by(Product.Name) %>%
summarise(Profit = sum(Profit))
profit_tech_copiers <- profit_tech_copiers %>%
mutate(remark=paste(Product.Name,":",Profit))
total_profit_tech_copiers <- profit_tech_copiers %>%
ggplot(aes(text=remark)) +
geom_col(aes(x=reorder(Product.Name,-Profit),y=Profit,fill=Product.Name),position = "dodge") +
labs(title = "Profit Technology and Copiers",y="Profit",x="")+
theme_algoritma + theme(axis.title.x = element_blank())
ggplotly(total_profit_tech_copiers,tooltip = "text")What is the type of Ship Mode that deliver fastest from order date to ship date?
shipment <- retail %>%
group_by(Ship.Mode) %>%
summarise(avg_shipment = mean(shiporder_difftime))
shipment <- shipment %>%
mutate(remark=paste(Ship.Mode,":",avg_shipment))
avg_shipment <- shipment %>%
ggplot(aes(text=remark)) +
geom_col(aes(x=reorder(Ship.Mode,-avg_shipment),y=avg_shipment,fill=Ship.Mode),position = "dodge") +
labs(title = "Average Shipment",x="Segment",y="Average Shipment") +
theme_algoritma
ggplotly(avg_shipment,tooltip = "text")discount <- retail %>%
group_by(Category) %>%
summarise(avg_disc = mean(Discount))
discount <- discount %>%
mutate(remark=paste(Category,":",avg_disc))
plot_avg_disc <- discount %>%
ggplot(aes(text=remark)) +
geom_col(aes(x=reorder(Category,-avg_disc),y=avg_disc,fill=Category),position = "dodge") +
labs(title = "Average Discount",x="Category",y="Average Discount") +
theme_algoritma
ggplotly(plot_avg_disc,tooltip = "text")ship_discount <- retail %>%
group_by(Category,Ship.Mode) %>%
summarise(avg_disc = mean(Discount))
ship_discount <- ship_discount %>%
mutate(remark=paste(Ship.Mode,":",avg_disc))
plot_avg_ship_disc <- ship_discount %>%
ggplot(aes(text=remark)) +
geom_col(aes(x=reorder(Ship.Mode,-avg_disc),y=avg_disc,fill=Category),position = "dodge") +
labs(title = "Average Discount by Ship.Mode and Category",x="Ship.Mode",y="Average Discount") +
theme_algoritma
ggplotly(plot_avg_ship_disc,tooltip = "text")In 4 years from January 2014 to December 2017, The company successfully has earned total profit 286397 from 37873 items sold. The company’s total profit always growing every year. 81% from all transaction has brought profit for the company and 19% has negative profit. They are selling 3 categories such as Furniture, Office Supplies, and Technologies. The company is also eyeing 3 segments in the market which consists of Consumer, Corporate, and Home Office.
Technologies is one of the most profitable categories with total profit in 4 years is 145454.9481, eventhough Technology is actually has the smallest amount of quantity sold by the company. Maximum profit 8399.976 in 4 years was earned by the company in 2016 from segment Technologies with product name ‘Canon imageCLASS 2200 Advanced Copier’, but minimum profit -6599.978 was also earned by the company in the same year 2016 and from same segment Technologies with product name ‘Cubify CubeX 3D Printer Double Head Print’. The most profitable segment is Consumer and from category Technologies with total profit 70797.8096. Technologies category has 4 sub.categories consist of Accessories, Copiers, Machines, and Phones. The most profitable sub.categories is Copiers and the least is Machines, if we see from Technology category only. But from overall, the least sub. category profit is -17725.48 earned from Furniture and Tables. The company actually sold 13 types of Copiers, but the most profitable copiers are Canon imageCLASS 2200 Advanced Copier, Hewlett Packard LaserJet 3310 Copier, and Canon PC1060 Personal Laser Copier.
Consumer is the most profitable segment in the market for the company with total profit is 134119.2092. Eventhough the most items they like to buy is Office Supplies category, but the most profitable category from segment consumer is not coming from Office Supplies, but still coming from Technology.
The company offer 4 ways of shipment such as Same Day, First Class, Second Class, and Standard Class. The fastest Ship Mode is Same Day with average 0.04 day or only around 1 hour. The slowest Ship Mode is Standard Class with average 5 days of shipment.
The most category that like to give discount is Furniture with average discount 17% and Technology is the smallest with average discount 13%. If we break down again Furniture category with its shipment methods, Furniture category with ship.mode Same Day is the highest type that like to give discount with average discount 18.6%.